{
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "<h1>Creating a searchable Art Database with The MET's open-access collection</h1>"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "In this example, we show how you can enrich data using Cognitive Skills and write to an Azure Search Index using MMLSpark. We use a subset of The MET's open-access collection and enrich it by passing it through 'Describe Image' and a custom 'Image Similarity' skill. The results are then written to a searchable index."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import os, sys, time, json, requests\nfrom pyspark.ml import Transformer, Estimator, Pipeline\nfrom pyspark.ml.feature import SQLTransformer\nfrom pyspark.sql.functions import lit, udf, col, split"
      ],
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "execution_count": 3
    },
    {
      "cell_type": "code",
      "source": [
        "VISION_API_KEY = os.environ['VISION_API_KEY']\n",
        "AZURE_SEARCH_KEY = os.environ['AZURE_SEARCH_KEY']\n",
        "search_service = \"mmlspark-azure-search\"\n",
        "search_index = \"test\""
      ],
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "execution_count": 4
    },
    {
      "cell_type": "code",
      "source": [
        "data = spark.read\\\n  .format(\"csv\")\\\n  .option(\"header\", True)\\\n  .load(\"wasbs://publicwasb@mmlspark.blob.core.windows.net/metartworks_sample.csv\")\\\n  .withColumn(\"searchAction\", lit(\"upload\"))\\\n  .withColumn(\"Neighbors\", split(col(\"Neighbors\"), \",\").cast(\"array<string>\"))\\\n  .withColumn(\"Tags\", split(col(\"Tags\"), \",\").cast(\"array<string>\"))\\\n  .limit(25)"
      ],
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "execution_count": 5
    },
    {
      "cell_type": "markdown",
      "source": [
        "<img src=\"https://mmlspark.blob.core.windows.net/graphics/CognitiveSearchHyperscale/MetArtworkSamples.png\" width=\"800\" style=\"float: center;\"/>"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "from mmlspark.cognitive import AnalyzeImage\nfrom mmlspark.stages import SelectColumns\n\n#define pipeline\ndescribeImage = (AnalyzeImage()\n  .setSubscriptionKey(VISION_API_KEY)\n  .setLocation(\"eastus\")\n  .setImageUrlCol(\"PrimaryImageUrl\")\n  .setOutputCol(\"RawImageDescription\")\n  .setErrorCol(\"Errors\")\n  .setVisualFeatures([\"Categories\", \"Tags\", \"Description\", \"Faces\", \"ImageType\", \"Color\", \"Adult\"])\n  .setConcurrency(5))\n\ndf2 = describeImage.transform(data)\\\n  .select(\"*\", \"RawImageDescription.*\").drop(\"Errors\", \"RawImageDescription\")"
      ],
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "execution_count": 7
    },
    {
      "cell_type": "markdown",
      "source": [
        "<img src=\"https://mmlspark.blob.core.windows.net/graphics/CognitiveSearchHyperscale/MetArtworksProcessed.png\" width=\"800\" style=\"float: center;\"/>"
      ],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": [
        "Before writing the results to a Search Index, you must define a schema which must specify the name, type, and attributes of each field in your index. Refer [Create a basic index in Azure Search](https://docs.microsoft.com/en-us/azure/search/search-what-is-an-index) for more information."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "from mmlspark.cognitive import *\ndf2.writeToAzureSearch(\n  subscriptionKey=AZURE_SEARCH_KEY,\n  actionCol=\"searchAction\",\n  serviceName=search_service,\n  indexName=search_index,\n  keyCol=\"ObjectID\"\n)"
      ],
      "metadata": {
        "scrolled": false
      },
      "outputs": [],
      "execution_count": 10
    },
    {
      "cell_type": "markdown",
      "source": [
        "The Search Index can be queried using the [Azure Search REST API](https://docs.microsoft.com/rest/api/searchservice/) by sending GET or POST requests and specifying query parameters that give the criteria for selecting matching documents. For more information on querying refer [Query your Azure Search index using the REST API](https://docs.microsoft.com/en-us/rest/api/searchservice/Search-Documents)"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "url = 'https://{}.search.windows.net/indexes/{}/docs/search?api-version=2019-05-06'.format(search_service, search_index)\nrequests.post(url, json={\"search\": \"Glass\"}, headers = {\"api-key\": AZURE_SEARCH_KEY}).json()"
      ],
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "execution_count": 12
    },
    {
      "cell_type": "code",
      "source": [
        "# "
      ],
      "metadata": {},
      "outputs": [],
      "execution_count": 13
    }
  ],
  "metadata": {
    "kernelspec": {
     "display_name": "Python 3",
     "language": "python",
     "name": "python3"
    },
    "language_info": {
     "codemirror_mode": {
      "name": "ipython",
      "version": 3
     },
     "file_extension": ".py",
     "mimetype": "text/x-python",
     "name": "python",
     "nbconvert_exporter": "python",
     "pygments_lexer": "ipython3",
     "version": "3.6.3"
    }
   },
   "nbformat": 4,
   "nbformat_minor": 2
}